Overview

Dataset statistics

Number of variables: 18
Number of observations: 2751
Missing cells: 15932
Missing cells (%): 32.2%
Duplicate rows: 5
Duplicate rows (%): 0.2%
Total size in memory: 884.4 KiB
Average record size in memory: 329.2 B
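
The percentages above follow from the raw counts. A minimal Python sketch, assuming the report divides missing cells by rows × columns and that 1 KiB is 1024 bytes (the variable names are illustrative and do not come from the report):

```python
# Re-derive the headline percentages from the counts reported above.
n_variables = 18
n_observations = 2751
missing_cells = 15932
duplicate_rows = 5
total_size_kib = 884.4

total_cells = n_variables * n_observations            # 49518 cells in total
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under those assumptions the three derived figures agree with the reported 32.2%, 0.2% and 329.2 B.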

Variable types

Categorical: 8
Numeric: 8
DateTime: 1
Boolean: 1

Dataset

Description: Quality-verified clinical data for JHB_Aurum_009
Creator: HEAT Research Programme
Author: RP2 Clinical Data Team
URL: https://github.com/Logic06183/RP2_dataoverview

Variable descriptions

study_source: Study identifier
Age (at enrolment): Patient age at study enrollment
Sex: Biological sex
Race: Racial/ethnic group
enrollment_date: Date of study enrollment
visit_date: Date of clinic visit
primary_date: Primary reference date
study_arm: Study treatment arm
study_visit: Study visit number
Antiretroviral Therapy Status: Current ART status
BMI (kg/m²): Body Mass Index
weight_kg: Body weight in kilograms
height_m: Height in meters
Waist circumference (cm): Waist circumference in centimeters
hip_circumference_cm: Hip circumference in centimeters
waist_hip_ratio: Waist-to-hip ratio
systolic_bp_mmHg: Systolic blood pressure
diastolic_bp_mmHg: Diastolic blood pressure
heart_rate_bpm: Heart rate in beats per minute
Respiratory rate (breaths/min): Respiratory rate
Oxygen saturation (%): Oxygen saturation
body_temperature_celsius: Body temperature in Celsius
CD4 cell count (cells/µL): CD4+ T lymphocyte count
HIV viral load (copies/mL): HIV RNA copies per mL
cd4_percent: CD4+ percentage
cd8_count_cells_uL: CD8+ T lymphocyte count
cd4_cd8_ratio: CD4/CD8 ratio
Hematocrit (%): Hematocrit
hemoglobin_g_dL: Hemoglobin concentration
White blood cell count (×10³/µL): Total WBC count
Red blood cell count (×10⁶/µL): Total RBC count
Platelet count (×10³/µL): Platelet count
MCV (MEAN CELL VOLUME): Mean corpuscular volume
mch_pg: Mean corpuscular hemoglobin
mchc_g_dL: Mean corpuscular hemoglobin concentration
RDW: Red cell distribution width
Lymphocyte count (×10⁹/L): Lymphocyte absolute count
Neutrophil count (×10⁹/L): Neutrophil absolute count
Monocyte count (×10⁹/L): Monocyte absolute count
Eosinophil count (×10⁹/L): Eosinophil absolute count
Basophil count (×10⁹/L): Basophil absolute count
lymphocyte_percent: Lymphocyte percentage
neutrophil_percent: Neutrophil percentage
monocyte_percent: Monocyte percentage
eosinophil_percent: Eosinophil percentage
basophil_percent: Basophil percentage
ALT (U/L): Alanine aminotransferase
AST (U/L): Aspartate aminotransferase
Alkaline phosphatase (U/L): Alkaline phosphatase
Total bilirubin (mg/dL): Total bilirubin
direct_bilirubin_mg_dL: Direct bilirubin
indirect_bilirubin_mg_dL: Indirect bilirubin
Albumin (g/dL): Serum albumin
Total protein (g/dL): Total serum protein
ggt_u_L: Gamma-glutamyl transferase
creatinine_umol_L: Serum creatinine (µmol/L)
creatinine_mg_dL: Serum creatinine (mg/dL)
creatinine clearance: Estimated creatinine clearance
bun_mg_dL: Blood urea nitrogen
urea_mmol_L: Serum urea
egfr_ml_min: Estimated glomerular filtration rate
Sodium (mEq/L): Serum sodium
Potassium (mEq/L): Serum potassium
chloride_mEq_L: Serum chloride
bicarbonate_mEq_L: Serum bicarbonate
calcium_mg_dL: Serum calcium
magnesium_mg_dL: Serum magnesium
phosphate_mg_dL: Serum phosphate
total_cholesterol_mg_dL: Total cholesterol
hdl_cholesterol_mg_dL: HDL cholesterol
ldl_cholesterol_mg_dL: LDL cholesterol
Triglycerides (mg/dL): Triglycerides
vldl_cholesterol_mg_dL: VLDL cholesterol
cholesterol_hdl_ratio: Total cholesterol/HDL ratio
fasting_glucose_mmol_L: Fasting blood glucose (mmol/L)
glucose_mg_dL: Blood glucose (mg/dL)
hba1c_percent: Glycated hemoglobin
insulin_uIU_mL: Serum insulin
lactate_mmol_L: Blood lactate
crp_mg_L: C-reactive protein
esr_mm_hr: Erythrocyte sedimentation rate
pt_seconds: Prothrombin time
inr: International normalized ratio
aptt_seconds: Activated partial thromboplastin time
uric_acid_mg_dL: Serum uric acid
ldh_u_L: Lactate dehydrogenase
ck_u_L: Creatine kinase
amylase_u_L: Serum amylase
lipase_u_L: Serum lipase
climate_daily_mean_temp: Daily mean temperature
climate_daily_max_temp: Daily maximum temperature
climate_daily_min_temp: Daily minimum temperature
climate_temp_anomaly: Temperature anomaly from baseline
climate_heat_day_p90: Heat day indicator (>90th percentile)
climate_heat_day_p95: Heat day indicator (>95th percentile)
climate_heat_stress_index: Heat stress index
climate_humidity: Relative humidity
climate_precipitation: Precipitation
climate_season: Season
cd4_correction_applied: Quality flag: CD4 corrections applied
final_comprehensive_fix_applied: Quality flag: Comprehensive corrections applied
waist_circ_unit_correction_applied: Quality flag: Waist circumference unit corrected
sa_biomarker_standards: South African biomarker reference standards applied

Alerts

study_source has constant value "JHB_Aurum_009" [Constant]
final_comprehensive_fix_applied has constant value "1.0" [Constant]
waist_circ_unit_correction_applied has constant value "False" [Constant]
sa_biomarker_standards has constant value "1.0" [Constant]
Dataset has 5 (0.2%) duplicate rows [Duplicates]
CD4 cell count (cells/µL) is highly overall correlated with cd4_correction_applied [High correlation]
cd4_correction_applied is highly overall correlated with CD4 cell count (cells/µL) [High correlation]
climate_daily_max_temp is highly overall correlated with climate_daily_mean_temp and 5 other fields [High correlation]
climate_daily_mean_temp is highly overall correlated with climate_daily_max_temp and 5 other fields [High correlation]
climate_daily_min_temp is highly overall correlated with climate_daily_max_temp and 5 other fields [High correlation]
climate_heat_day_p90 is highly overall correlated with climate_daily_max_temp and 6 other fields [High correlation]
climate_heat_day_p95 is highly overall correlated with climate_daily_max_temp and 6 other fields [High correlation]
climate_heat_stress_index is highly overall correlated with climate_daily_max_temp and 5 other fields [High correlation]
climate_season is highly overall correlated with climate_daily_max_temp and 6 other fields [High correlation]
climate_temp_anomaly is highly overall correlated with climate_heat_day_p90 and 2 other fields [High correlation]
cd4_correction_applied is highly imbalanced (85.9%) [Imbalance]
climate_heat_day_p90 is highly imbalanced (69.4%) [Imbalance]
climate_heat_day_p95 is highly imbalanced (69.4%) [Imbalance]
CD4 cell count (cells/µL) has 533 (19.4%) missing values [Missing]
HIV viral load (copies/mL) has 2461 (89.5%) missing values [Missing]
climate_daily_mean_temp has 1616 (58.7%) missing values [Missing]
climate_daily_max_temp has 1616 (58.7%) missing values [Missing]
climate_daily_min_temp has 1616 (58.7%) missing values [Missing]
climate_temp_anomaly has 1616 (58.7%) missing values [Missing]
climate_heat_day_p90 has 1616 (58.7%) missing values [Missing]
climate_heat_day_p95 has 1616 (58.7%) missing values [Missing]
climate_heat_stress_index has 1616 (58.7%) missing values [Missing]
climate_season has 1616 (58.7%) missing values [Missing]
HIV viral load (copies/mL) has 246 (8.9%) zeros [Zeros]
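
The imbalance percentages in the alerts are consistent with a score of one minus the normalised Shannon entropy of the class counts. That formula is an assumption inferred from the reported numbers, not something the report states, and `imbalance_score` is a hypothetical helper written for this sketch. The class counts (2696 vs 55, and 1073 vs 62) are taken from the variable summaries further down.

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

cd4_flag = imbalance_score([2696, 55])    # cd4_correction_applied: 0.0 vs 1.0
heat_day = imbalance_score([1073, 62])    # climate_heat_day_p90 / p95: 0.0 vs 1.0

print(f"cd4_correction_applied: {cd4_flag:.1%}")
print(f"climate_heat_day_p90:   {heat_day:.1%}")
```

A perfectly balanced two-class variable scores 0 under this definition, and the score approaches 1 as one class dominates.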

Reproduction

Analysis started: 2025-11-25 05:10:07.797710
Analysis finished: 2025-11-25 05:10:10.643284
Duration: 2.85 seconds
Software version: ydata-profiling v4.18.0
Download configuration: config.json
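
The reported duration can be checked against the two timestamps. A short sketch using only the Python standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2025-11-25 05:10:07.797710")
finished = datetime.fromisoformat("2025-11-25 05:10:10.643284")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```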

Variables

study_source
Categorical

Constant 

Study identifier

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size188.1 KiB
JHB_Aurum_009
2751 

Length

Max length13
Median length13
Mean length13
Min length13

Characters and Unicode

Total characters35763
Distinct characters10
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJHB_Aurum_009
2nd rowJHB_Aurum_009
3rd rowJHB_Aurum_009
4th rowJHB_Aurum_009
5th rowJHB_Aurum_009

Common Values

ValueCountFrequency (%)
JHB_Aurum_0092751
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
jhb_aurum_0092751
100.0%

Most occurring characters

ValueCountFrequency (%)
_5502
15.4%
u5502
15.4%
05502
15.4%
J2751
7.7%
H2751
7.7%
B2751
7.7%
A2751
7.7%
r2751
7.7%
m2751
7.7%
92751
7.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter11004
30.8%
Uppercase Letter11004
30.8%
Decimal Number8253
23.1%
Connector Punctuation5502
15.4%
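
The category totals above can be reproduced with Python's `unicodedata` module, which returns two-letter general-category codes (`Lu`, `Ll`, `Nd`, `Pc`) where the report prints the long names (Uppercase Letter, Lowercase Letter, Decimal Number, Connector Punctuation). This is a sketch of the idea, not necessarily how the profiling tool computes it:

```python
import unicodedata
from collections import Counter

value = "JHB_Aurum_009"   # the only value of study_source
n_rows = 2751             # every row holds this value

# Count general categories within one value, then scale by the row count.
per_value = Counter(unicodedata.category(ch) for ch in value)
totals = {category: n * n_rows for category, n in per_value.items()}

for category, count in sorted(totals.items()):
    print(category, count)
```

The four totals sum to 35763, the reported total character count.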

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
J2751
25.0%
H2751
25.0%
B2751
25.0%
A2751
25.0%
Lowercase Letter
ValueCountFrequency (%)
u5502
50.0%
r2751
25.0%
m2751
25.0%
Decimal Number
ValueCountFrequency (%)
05502
66.7%
92751
33.3%
Connector Punctuation
ValueCountFrequency (%)
_5502
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin22008
61.5%
Common13755
38.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
u5502
25.0%
J2751
12.5%
H2751
12.5%
B2751
12.5%
A2751
12.5%
r2751
12.5%
m2751
12.5%
Common
ValueCountFrequency (%)
_5502
40.0%
05502
40.0%
92751
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII35763
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_5502
15.4%
u5502
15.4%
05502
15.4%
J2751
7.7%
H2751
7.7%
B2751
7.7%
A2751
7.7%
r2751
7.7%
m2751
7.7%
92751
7.7%

Age (at enrolment)
Real number (ℝ)

Patient age at study enrollment

Distinct59
Distinct (%)2.1%
Missing6
Missing (%)0.2%
Infinite0
Infinite (%)0.0%
Mean34.426958
Minimum15
Maximum76
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size43.0 KiB

Quantile statistics

Minimum15
5-th percentile20
Q127
median33
Q340
95-th percentile54
Maximum76
Range61
Interquartile range (IQR)13

Descriptive statistics

Standard deviation10.178108
Coefficient of variation (CV)0.29564354
Kurtosis0.24473046
Mean34.426958
Median Absolute Deviation (MAD)7
Skewness0.70885633
Sum94502
Variance103.59388
MonotonicityNot monotonic
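
Several of the statistics above are derived from one another, so they can be cross-checked. A minimal sketch, assuming the mean is taken over non-missing rows only (2751 rows minus 6 missing); the inputs are the rounded figures printed in the report:

```python
# Figures copied from the Age (at enrolment) summary.
n_rows, n_missing = 2751, 6
total, std = 94502, 10.178108
q1, q3 = 27, 40
minimum, maximum = 15, 76

n_valid = n_rows - n_missing      # rows that contribute to the statistics
mean = total / n_valid            # Sum / non-missing count
cv = std / mean                   # coefficient of variation
variance = std ** 2
iqr = q3 - q1
value_range = maximum - minimum

print(f"mean={mean:.6f} cv={cv:.8f} variance={variance:.5f}")
print(f"IQR={iqr} range={value_range}")
```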
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
31                  125     4.5%
30                  117     4.3%
29                  116     4.2%
28                  113     4.1%
27                  108     3.9%
32                  106     3.9%
26                  104     3.8%
34                  102     3.7%
24                  101     3.7%
33                  97      3.5%
Other values (49)   1656    60.2%

Minimum 10 values
Value               Count   Frequency (%)
15                  4       0.1%
16                  3       0.1%
17                  15      0.5%
18                  24      0.9%
19                  40      1.5%
20                  59      2.1%
21                  56      2.0%
22                  73      2.7%
23                  85      3.1%
24                  101     3.7%

Maximum 10 values
Value               Count   Frequency (%)
76                  1       < 0.1%
74                  1       < 0.1%
72                  2       0.1%
71                  1       < 0.1%
70                  1       < 0.1%
69                  2       0.1%
68                  3       0.1%
67                  1       < 0.1%
66                  1       < 0.1%
65                  5       0.2%

Sex
Categorical

Biological sex

Distinct2
Distinct (%)0.1%
Missing4
Missing (%)0.1%
Memory size165.9 KiB
Male
1708 
Female
1039 

Length

Max length6
Median length4
Mean length4.7564616
Min length4

Characters and Unicode

Total characters13066
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowFemale
2nd rowFemale
3rd rowMale
4th rowMale
5th rowFemale

Common Values

Value               Count   Frequency (%)
Male                1708    62.1%
Female              1039    37.8%
(Missing)           4       0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
male1708
62.2%
female1039
37.8%
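
The Common Values table lists Male at 62.1% while the plot table lists 62.2%. The difference is consistent with two denominators: all 2751 rows in the first case and only the non-missing rows in the second. That reading is an inference from the numbers rather than something the report states. A short sketch:

```python
# Counts taken from the Sex summary; 2751 is the total number of rows.
counts = {"Male": 1708, "Female": 1039}
n_rows = 2751

n_valid = sum(counts.values())    # rows with a recorded sex
n_missing = n_rows - n_valid      # rows with no value

shares = {}
for label, count in counts.items():
    of_all_rows = round(100 * count / n_rows, 1)
    of_valid_rows = round(100 * count / n_valid, 1)
    shares[label] = (of_all_rows, of_valid_rows)
    print(f"{label}: {of_all_rows}% of all rows, {of_valid_rows}% of non-missing rows")
```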

Most occurring characters

ValueCountFrequency (%)
e3786
29.0%
a2747
21.0%
l2747
21.0%
M1708
13.1%
F1039
 
8.0%
m1039
 
8.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter10319
79.0%
Uppercase Letter2747
 
21.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e3786
36.7%
a2747
26.6%
l2747
26.6%
m1039
 
10.1%
Uppercase Letter
ValueCountFrequency (%)
M1708
62.2%
F1039
37.8%

Most occurring scripts

ValueCountFrequency (%)
Latin13066
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e3786
29.0%
a2747
21.0%
l2747
21.0%
M1708
13.1%
F1039
 
8.0%
m1039
 
8.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII13066
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e3786
29.0%
a2747
21.0%
l2747
21.0%
M1708
13.1%
F1039
 
8.0%
m1039
 
8.0%

primary_date
Date

Primary reference date

Distinct447
Distinct (%)16.2%
Missing0
Missing (%)0.0%
Memory size43.0 KiB
Minimum2013-03-14 00:00:00
Maximum2015-08-01 00:00:00
Invalid dates0
Invalid dates (%)0.0%
Histogram with fixed size bins (bins=50)

CD4 cell count (cells/µL)
Real number (ℝ)

High correlation  Missing 

CD4+ T lymphocyte count

Distinct854
Distinct (%)38.5%
Missing533
Missing (%)19.4%
Infinite0
Infinite (%)0.0%
Mean456.95807
Minimum3
Maximum2703
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size43.0 KiB

Quantile statistics

Minimum3
5-th percentile108.85
Q1272
median416
Q3589
95-th percentile937
Maximum2703
Range2700
Interquartile range (IQR)317

Descriptive statistics

Standard deviation268.47946
Coefficient of variation (CV)0.58753632
Kurtosis7.1691831
Mean456.95807
Median Absolute Deviation (MAD)155
Skewness1.6497118
Sum1013533
Variance72081.223
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
350                 9       0.3%
315                 9       0.3%
500                 9       0.3%
467                 9       0.3%
420                 8       0.3%
336                 8       0.3%
443                 8       0.3%
354                 8       0.3%
414                 8       0.3%
564                 8       0.3%
Other values (844)  2134    77.6%
(Missing)           533     19.4%

Minimum 10 values
Value               Count   Frequency (%)
3                   2       0.1%
6                   1       < 0.1%
8                   1       < 0.1%
10                  1       < 0.1%
15                  1       < 0.1%
16                  1       < 0.1%
20                  1       < 0.1%
21                  1       < 0.1%
28                  1       < 0.1%
29                  1       < 0.1%

Maximum 10 values
Value               Count   Frequency (%)
2703                1       < 0.1%
2609                2       0.1%
1996                1       < 0.1%
1781                1       < 0.1%
1725                1       < 0.1%
1577                1       < 0.1%
1568                1       < 0.1%
1564                1       < 0.1%
1549                1       < 0.1%
1508                1       < 0.1%

HIV viral load (copies/mL)
Real number (ℝ)

Missing  Zeros 

HIV RNA copies per mL

Distinct45
Distinct (%)15.5%
Missing2461
Missing (%)89.5%
Infinite0
Infinite (%)0.0%
Mean20363.586
Minimum0
Maximum2670000
Zeros246
Zeros (%)8.9%
Negative0
Negative (%)0.0%
Memory size43.0 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile7860.2
Maximum2670000
Range2670000
Interquartile range (IQR)0

Descriptive statistics

Standard deviation196029.65
Coefficient of variation (CV)9.6264796
Kurtosis145.0072
Mean20363.586
Median Absolute Deviation (MAD)0
Skewness11.783887
Sum5905440
Variance3.8427622 × 10¹⁰
MonotonicityNot monotonic
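
Because 89.5% of rows have no viral load value, the zero share looks very different depending on the denominator, which also explains why every quartile is 0. A sketch from the reported counts; whether a 0 encodes a result below the assay's detection limit is an assumption the report does not confirm:

```python
# Figures copied from the HIV viral load summary.
n_rows, n_missing, n_zeros = 2751, 2461, 246
total = 5905440                                   # reported Sum

n_measured = n_rows - n_missing                   # rows with any value
n_nonzero = n_measured - n_zeros                  # rows with a value above 0

zero_share_all = 100 * n_zeros / n_rows           # share of all rows
zero_share_measured = 100 * n_zeros / n_measured  # share of measured rows
mean_measured = total / n_measured                # matches the reported Mean
mean_nonzero = total / n_nonzero                  # mean among non-zero values

print(f"{n_measured} measured, {n_nonzero} non-zero")
print(f"zeros: {zero_share_all:.1f}% of rows, {zero_share_measured:.1f}% of measured")
print(f"mean (measured) = {mean_measured:.3f}, mean (non-zero) = {mean_nonzero:.1f}")
```

With more than half of the measured values at 0 and a maximum of 2670000, the mean is driven by a handful of very large values, so the median or a log-scale summary of the non-zero values may describe the distribution better.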
Histogram with fixed size bins (bins=45)
Common values
Value               Count   Frequency (%)
0                   246     8.9%
8555                1       < 0.1%
378                 1       < 0.1%
2435                1       < 0.1%
6442                1       < 0.1%
13795               1       < 0.1%
200                 1       < 0.1%
31                  1       < 0.1%
1898105             1       < 0.1%
132                 1       < 0.1%
Other values (35)   35      1.3%
(Missing)           2461    89.5%

Minimum 10 values
Value               Count   Frequency (%)
0                   246     8.9%
10                  1       < 0.1%
31                  1       < 0.1%
51                  1       < 0.1%
74                  1       < 0.1%
82                  1       < 0.1%
87                  1       < 0.1%
132                 1       < 0.1%
143                 1       < 0.1%
174                 1       < 0.1%

Maximum 10 values
Value               Count   Frequency (%)
2670000             1       < 0.1%
1898105             1       < 0.1%
650442              1       < 0.1%
164351              1       < 0.1%
149247              1       < 0.1%
125054              1       < 0.1%
44011               1       < 0.1%
38500               1       < 0.1%
34868               1       < 0.1%
22276               1       < 0.1%

cd4_correction_applied
Categorical

High correlation  Imbalance 

Quality flag: CD4 corrections applied

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size161.2 KiB
0.0
2696 
1.0
 
55

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters8253
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

Value               Count   Frequency (%)
0.0                 2696    98.0%
1.0                 55      2.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0.02696
98.0%
1.055
 
2.0%

Most occurring characters

ValueCountFrequency (%)
05447
66.0%
.2751
33.3%
155
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5502
66.7%
Other Punctuation2751
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
05447
99.0%
155
 
1.0%
Other Punctuation
ValueCountFrequency (%)
.2751
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common8253
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
05447
66.0%
.2751
33.3%
155
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII8253
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
05447
66.0%
.2751
33.3%
155
 
0.7%

final_comprehensive_fix_applied
Categorical

Constant 

Quality flag: Comprehensive corrections applied

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size161.2 KiB
1.0
2751 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters8253
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1.0
2nd row1.0
3rd row1.0
4th row1.0
5th row1.0

Common Values

ValueCountFrequency (%)
1.02751
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1.02751
100.0%

Most occurring characters

ValueCountFrequency (%)
12751
33.3%
.2751
33.3%
02751
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5502
66.7%
Other Punctuation2751
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
12751
50.0%
02751
50.0%
Other Punctuation
ValueCountFrequency (%)
.2751
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common8253
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
12751
33.3%
.2751
33.3%
02751
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII8253
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
12751
33.3%
.2751
33.3%
02751
33.3%

waist_circ_unit_correction_applied
Boolean

Constant 

Quality flag: Waist circumference unit corrected

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size24.2 KiB
False
2751 
ValueCountFrequency (%)
False2751
100.0%

sa_biomarker_standards
Categorical

Constant 

South African biomarker reference standards applied

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size161.2 KiB
1.0
2751 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters8253
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1.0
2nd row1.0
3rd row1.0
4th row1.0
5th row1.0

Common Values

ValueCountFrequency (%)
1.02751
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1.02751
100.0%

Most occurring characters

ValueCountFrequency (%)
12751
33.3%
.2751
33.3%
02751
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5502
66.7%
Other Punctuation2751
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
12751
50.0%
02751
50.0%
Other Punctuation
ValueCountFrequency (%)
.2751
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common8253
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
12751
33.3%
.2751
33.3%
02751
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII8253
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
12751
33.3%
.2751
33.3%
02751
33.3%

climate_daily_mean_temp
Real number (ℝ)

High correlation  Missing 

Daily mean temperature

Distinct11
Distinct (%)1.0%
Missing1616
Missing (%)58.7%
Infinite0
Infinite (%)0.0%
Mean15.451807
Minimum9.356
Maximum23.589
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size43.0 KiB

Quantile statistics

Minimum9.356
5-th percentile9.356
Q113.213
median14.195
Q319.293
95-th percentile23.589
Maximum23.589
Range14.233
Interquartile range (IQR)6.08

Descriptive statistics

Standard deviation3.5385321
Coefficient of variation (CV)0.22900442
Kurtosis-0.30036519
Mean15.451807
Median Absolute Deviation (MAD)0.982
Skewness0.47348153
Sum17537.801
Variance12.521209
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
Common values
Value               Count   Frequency (%)
19.293              214     7.8%
13.213              208     7.6%
14.195              187     6.8%
13.868              144     5.2%
9.356               98      3.6%
18.203              67      2.4%
23.589              62      2.3%
13.656              53      1.9%
13.316              41      1.5%
17.799              39      1.4%
(Missing)           1616    58.7%

Minimum 10 values
Value               Count   Frequency (%)
9.356               98      3.6%
13.213              208     7.6%
13.316              41      1.5%
13.656              53      1.9%
13.868              144     5.2%
14.195              187     6.8%
17.799              39      1.4%
18.203              67      2.4%
19.293              214     7.8%
20.293              22      0.8%

Maximum 10 values
Value               Count   Frequency (%)
23.589              62      2.3%
20.293              22      0.8%
19.293              214     7.8%
18.203              67      2.4%
17.799              39      1.4%
14.195              187     6.8%
13.868              144     5.2%
13.656              53      1.9%
13.316              41      1.5%
13.213              208     7.6%
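
With only 11 distinct temperatures, the summary statistics can be rebuilt from the value counts alone. The sketch below uses the value/count pairs as decoded from the tables above (the pairing of each value with its count is a reading of the flattened table, so treat it as an assumption) and recovers the reported Sum, Mean and median:

```python
from statistics import median

# value -> number of rows, for the 1135 non-missing rows (2751 - 1616).
value_counts = {
    19.293: 214, 13.213: 208, 14.195: 187, 13.868: 144, 9.356: 98,
    18.203: 67, 23.589: 62, 13.656: 53, 13.316: 41, 17.799: 39, 20.293: 22,
}

n_valid = sum(value_counts.values())
total = sum(value * count for value, count in value_counts.items())
mean = total / n_valid

# Expand the table back into one entry per row to take the median.
expanded = [value for value, count in value_counts.items() for _ in range(count)]
med = median(expanded)

print(f"n={n_valid} sum={total:.3f} mean={mean:.6f} median={med}")
```

That the weighted sum lands on the reported 17537.801 is a useful check that the fused value and count digits were split correctly.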

climate_daily_max_temp
Real number (ℝ)

High correlation  Missing 

Daily maximum temperature

Distinct11
Distinct (%)1.0%
Missing1616
Missing (%)58.7%
Infinite0
Infinite (%)0.0%
Mean23.182599
Minimum17.553
Maximum30.083
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size43.0 KiB

Quantile statistics

Minimum17.553
5-th percentile17.553
Q121.474
median22.413
Q326.343
95-th percentile30.083
Maximum30.083
Range12.53
Interquartile range (IQR)4.869

Descriptive statistics

Standard deviation2.9483779
Coefficient of variation (CV)0.12718065
Kurtosis0.15361931
Mean23.182599
Median Absolute Deviation (MAD)1.066
Skewness0.324421
Sum26312.25
Variance8.6929324
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
Common values
Value               Count   Frequency (%)
26.343              214     7.8%
22.23               208     7.6%
23.023              187     6.8%
21.347              144     5.2%
17.553              98      3.6%
22.413              67      2.4%
30.083              62      2.3%
21.474              53      1.9%
20.768              41      1.5%
25.8                39      1.4%
(Missing)           1616    58.7%

Minimum 10 values
Value               Count   Frequency (%)
17.553              98      3.6%
20.768              41      1.5%
21.347              144     5.2%
21.474              53      1.9%
22.23               208     7.6%
22.413              67      2.4%
23.023              187     6.8%
25.8                39      1.4%
26.343              214     7.8%
26.769              22      0.8%

Maximum 10 values
Value               Count   Frequency (%)
30.083              62      2.3%
26.769              22      0.8%
26.343              214     7.8%
25.8                39      1.4%
23.023              187     6.8%
22.413              67      2.4%
22.23               208     7.6%
21.474              53      1.9%
21.347              144     5.2%
20.768              41      1.5%

climate_daily_min_temp
Real number (ℝ)

High correlation  Missing 

Daily minimum temperature

Distinct11
Distinct (%)1.0%
Missing1616
Missing (%)58.7%
Infinite0
Infinite (%)0.0%
Mean7.5503286
Minimum2.343
Maximum14.954
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size43.0 KiB

Quantile statistics

Minimum2.343
5-th percentile2.343
Q13.763
median6.616
Q311.253
95-th percentile14.954
Maximum14.954
Range12.611
Interquartile range (IQR)7.49

Descriptive statistics

Standard deviation4.0456474
Coefficient of variation (CV)0.53582401
Kurtosis-1.0855077
Mean7.5503286
Median Absolute Deviation (MAD)2.853
Skewness0.50562955
Sum8569.623
Variance16.367263
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
Common values
Value               Count   Frequency (%)
11.253              214     7.8%
3.763               208     7.6%
4.56                187     6.8%
7.436               144     5.2%
2.343               98      3.6%
14.79               67      2.4%
14.954              62      2.3%
6.034               53      1.9%
6.616               41      1.5%
10.493              39      1.4%
(Missing)           1616    58.7%

Minimum 10 values
Value               Count   Frequency (%)
2.343               98      3.6%
3.763               208     7.6%
4.56                187     6.8%
6.034               53      1.9%
6.616               41      1.5%
7.436               144     5.2%
10.493              39      1.4%
11.253              214     7.8%
13.968              22      0.8%
14.79               67      2.4%

Maximum 10 values
Value               Count   Frequency (%)
14.954              62      2.3%
14.79               67      2.4%
13.968              22      0.8%
11.253              214     7.8%
10.493              39      1.4%
7.436               144     5.2%
6.616               41      1.5%
6.034               53      1.9%
4.56                187     6.8%
3.763               208     7.6%

climate_temp_anomaly
Real number (ℝ)

High correlation  Missing 

Temperature anomaly from baseline

Distinct11
Distinct (%)1.0%
Missing1616
Missing (%)58.7%
Infinite0
Infinite (%)0.0%
Mean7.1579163
Minimum3.618
Maximum10.271
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size43.0 KiB

Quantile statistics

Minimum3.618
5-th percentile3.618
Q16.505
median7.489
Q39.042
95-th percentile10.271
Maximum10.271
Range6.653
Interquartile range (IQR)2.537

Descriptive statistics

Standard deviation2.2633511
Coefficient of variation (CV)0.31620252
Kurtosis-0.9276673
Mean7.1579163
Median Absolute Deviation (MAD)1.553
Skewness-0.39512055
Sum8124.235
Variance5.1227584
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
Common values
Value               Count   Frequency (%)
7.489               214     7.8%
3.654               208     7.6%
7.602               187     6.8%
10.271              144     5.2%
6.918               98      3.6%
3.618               67      2.4%
9.042               62      2.3%
9.839               53      1.9%
7.913               41      1.5%
10.025              39      1.4%
(Missing)           1616    58.7%

Minimum 10 values
Value               Count   Frequency (%)
3.618               67      2.4%
3.654               208     7.6%
6.505               22      0.8%
6.918               98      3.6%
7.489               214     7.8%
7.602               187     6.8%
7.913               41      1.5%
9.042               62      2.3%
9.839               53      1.9%
10.025              39      1.4%

Maximum 10 values
Value               Count   Frequency (%)
10.271              144     5.2%
10.025              39      1.4%
9.839               53      1.9%
9.042               62      2.3%
7.913               41      1.5%
7.602               187     6.8%
7.489               214     7.8%
6.918               98      3.6%
6.505               22      0.8%
3.654               208     7.6%

climate_heat_day_p90
Categorical

High correlation  Imbalance  Missing 

Heat day indicator (>90th percentile)

Distinct2
Distinct (%)0.2%
Missing1616
Missing (%)58.7%
Memory size167.5 KiB
0.0
1073 
1.0
 
62

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters3405
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

Value               Count   Frequency (%)
0.0                 1073    39.0%
1.0                 62      2.3%
(Missing)           1616    58.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0.01073
94.5%
1.062
 
5.5%

Most occurring characters

ValueCountFrequency (%)
02208
64.8%
.1135
33.3%
162
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2270
66.7%
Other Punctuation1135
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02208
97.3%
162
 
2.7%
Other Punctuation
ValueCountFrequency (%)
.1135
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3405
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02208
64.8%
.1135
33.3%
162
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII3405
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02208
64.8%
.1135
33.3%
162
 
1.8%

climate_heat_day_p95
Categorical

High correlation  Imbalance  Missing 

Heat day indicator (>95th percentile)

Distinct2
Distinct (%)0.2%
Missing1616
Missing (%)58.7%
Memory size167.5 KiB
0.0
1073 
1.0
 
62

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters3405
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

Value               Count   Frequency (%)
0.0                 1073    39.0%
1.0                 62      2.3%
(Missing)           1616    58.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0.01073
94.5%
1.062
 
5.5%

Most occurring characters

ValueCountFrequency (%)
02208
64.8%
.1135
33.3%
162
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2270
66.7%
Other Punctuation1135
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02208
97.3%
162
 
2.7%
Other Punctuation
ValueCountFrequency (%)
.1135
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3405
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02208
64.8%
.1135
33.3%
162
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII3405
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02208
64.8%
.1135
33.3%
162
 
1.8%

climate_heat_stress_index
Real number (ℝ)

High correlation  Missing 

Heat stress index

Distinct11
Distinct (%)1.0%
Missing1616
Missing (%)58.7%
Infinite0
Infinite (%)0.0%
Mean18.312848
Minimum13.428
Maximum27.393
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size43.0 KiB

Quantile statistics

Minimum13.428
5-th percentile13.639
Q114.306
median17.923
Q321.523
95-th percentile27.393
Maximum27.393
Range13.965
Interquartile range (IQR)7.217

Descriptive statistics

Standard deviation3.536553
Coefficient of variation (CV)0.19311867
Kurtosis0.20907555
Mean18.312848
Median Absolute Deviation (MAD)3.6
Skewness0.58250383
Sum20785.083
Variance12.507207
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
Common values
Value               Count   Frequency (%)
21.523              214     7.8%
19.275              208     7.6%
17.347              187     6.8%
14.306              144     5.2%
13.639              98      3.6%
17.923              67      2.4%
27.393              62      2.3%
13.428              53      1.9%
15.721              41      1.5%
19.958              39      1.4%
(Missing)           1616    58.7%

Minimum 10 values
Value               Count   Frequency (%)
13.428              53      1.9%
13.639              98      3.6%
14.306              144     5.2%
15.721              41      1.5%
17.347              187     6.8%
17.923              67      2.4%
19.275              208     7.6%
19.958              39      1.4%
21.523              214     7.8%
22.526              22      0.8%

Maximum 10 values
Value               Count   Frequency (%)
27.393              62      2.3%
22.526              22      0.8%
21.523              214     7.8%
19.958              39      1.4%
19.275              208     7.6%
17.923              67      2.4%
17.347              187     6.8%
15.721              41      1.5%
14.306              144     5.2%
13.639              98      3.6%

climate_season
Categorical

High correlation  Missing 

Season

Distinct4
Distinct (%)0.4%
Missing1616
Missing (%)58.7%
Memory size170.8 KiB
Spring
609 
Winter
295 
Summer
129 
Autumn
102 

Length

Max length6
Median length6
Mean length6
Min length6

Characters and Unicode

Total characters6810
Distinct characters12
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowAutumn
2nd rowSpring
3rd rowWinter
4th rowSpring
5th rowSpring

Common Values

Value               Count   Frequency (%)
Spring              609     22.1%
Winter              295     10.7%
Summer              129     4.7%
Autumn              102     3.7%
(Missing)           1616    58.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
spring | 609 | 53.7%
winter | 295 | 26.0%
summer | 129 | 11.4%
autumn | 102 | 9.0%
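
The percentages in this table differ from those under Common Values because the denominators differ: the earlier table divides by all rows, including the missing ones, while this one divides by non-missing rows only. A minimal Python sketch of that arithmetic, using the counts reported in this section (variable names are illustrative assumptions):

```python
# Counts taken from the Common Values table.
season_counts = {"Spring": 609, "Winter": 295, "Summer": 129, "Autumn": 102}
n_missing = 1616

n_observed = sum(season_counts.values())  # rows with a season recorded
n_total = n_observed + n_missing          # all rows in the dataset

# Percentage of all rows versus percentage of non-missing rows.
shares_all = {s: 100 * c / n_total for s, c in season_counts.items()}
shares_observed = {s: 100 * c / n_observed for s, c in season_counts.items()}

for season in season_counts:
    print(f"{season}: {shares_all[season]:.1f}% of all rows, "
          f"{shares_observed[season]:.1f}% of non-missing rows")
```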

Most occurring characters

Value | Count | Frequency (%)
r | 1033 | 15.2%
n | 1006 | 14.8%
i | 904 | 13.3%
S | 738 | 10.8%
p | 609 | 8.9%
g | 609 | 8.9%
e | 424 | 6.2%
t | 397 | 5.8%
m | 360 | 5.3%
u | 333 | 4.9%
Other values (2) | 397 | 5.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 5675 | 83.3%
Uppercase Letter | 1135 | 16.7%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 1033 | 18.2%
n | 1006 | 17.7%
i | 904 | 15.9%
p | 609 | 10.7%
g | 609 | 10.7%
e | 424 | 7.5%
t | 397 | 7.0%
m | 360 | 6.3%
u | 333 | 5.9%

Uppercase Letter
Value | Count | Frequency (%)
S | 738 | 65.0%
W | 295 | 26.0%
A | 102 | 9.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 6810 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
r | 1033 | 15.2%
n | 1006 | 14.8%
i | 904 | 13.3%
S | 738 | 10.8%
p | 609 | 8.9%
g | 609 | 8.9%
e | 424 | 6.2%
t | 397 | 5.8%
m | 360 | 5.3%
u | 333 | 4.9%
Other values (2) | 397 | 5.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6810 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
r | 1033 | 15.2%
n | 1006 | 14.8%
i | 904 | 13.3%
S | 738 | 10.8%
p | 609 | 8.9%
g | 609 | 8.9%
e | 424 | 6.2%
t | 397 | 5.8%
m | 360 | 5.3%
u | 333 | 4.9%
Other values (2) | 397 | 5.8%

Interactions

[Figures: 64 pairwise interaction plots]

Correlations

[Figure: correlation heatmap]

Variable | Age (at enrolment) | CD4 cell count (cells/µL) | HIV viral load (copies/mL) | Sex | cd4_correction_applied | climate_daily_max_temp | climate_daily_mean_temp | climate_daily_min_temp | climate_heat_day_p90 | climate_heat_day_p95 | climate_heat_stress_index | climate_season | climate_temp_anomaly
Age (at enrolment) | 1.000 | -0.130 | -0.088 | 0.200 | 0.054 | 0.012 | 0.021 | 0.028 | 0.052 | 0.052 | 0.025 | 0.041 | -0.020
CD4 cell count (cells/µL) | -0.130 | 1.000 | 0.031 | 0.168 | 1.000 | 0.038 | 0.042 | 0.018 | 0.000 | 0.000 | 0.008 | 0.000 | 0.033
HIV viral load (copies/mL) | -0.088 | 0.031 | 1.000 | 0.099 | 0.488 | 0.082 | 0.097 | 0.074 | 0.000 | 0.000 | 0.035 | 0.056 | -0.010
Sex | 0.200 | 0.168 | 0.099 | 1.000 | 0.000 | 0.000 | 0.000 | 0.042 | 0.000 | 0.000 | 0.027 | 0.000 | 0.000
cd4_correction_applied | 0.054 | 1.000 | 0.488 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
climate_daily_max_temp | 0.012 | 0.038 | 0.082 | 0.000 | 0.000 | 1.000 | 0.883 | 0.647 | 0.998 | 0.998 | 0.859 | 0.760 | -0.037
climate_daily_mean_temp | 0.021 | 0.042 | 0.097 | 0.000 | 0.000 | 0.883 | 1.000 | 0.900 | 0.998 | 0.998 | 0.672 | 0.738 | 0.220
climate_daily_min_temp | 0.028 | 0.018 | 0.074 | 0.042 | 0.000 | 0.647 | 0.900 | 1.000 | 0.609 | 0.609 | 0.537 | 0.941 | 0.266
climate_heat_day_p90 | 0.052 | 0.000 | 0.000 | 0.000 | 0.000 | 0.998 | 0.998 | 0.609 | 1.000 | 0.991 | 0.997 | 0.670 | 0.998
climate_heat_day_p95 | 0.052 | 0.000 | 0.000 | 0.000 | 0.000 | 0.998 | 0.998 | 0.609 | 0.991 | 1.000 | 0.997 | 0.670 | 0.998
climate_heat_stress_index | 0.025 | 0.008 | 0.035 | 0.027 | 0.000 | 0.859 | 0.672 | 0.537 | 0.997 | 0.997 | 1.000 | 0.933 | -0.295
climate_season | 0.041 | 0.000 | 0.056 | 0.000 | 0.000 | 0.760 | 0.738 | 0.941 | 0.670 | 0.670 | 0.933 | 1.000 | 0.785
climate_temp_anomaly | -0.020 | 0.033 | -0.010 | 0.000 | 0.000 | -0.037 | 0.220 | 0.266 | 0.998 | 0.998 | -0.295 | 0.785 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
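
Nullity correlation can be read as the ordinary Pearson correlation between missingness indicators. The Python sketch below illustrates the idea under that assumption; it is not the report's own implementation. The flags are transcribed from rows 3377 to 3386 of the Sample section, and the helper `pearson` and the variable names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# 1 = value missing (NaN), 0 = value present, for sample rows 3377 to 3386.
season_missing = [1, 1, 1, 1, 0, 1, 0, 1, 0, 1]      # climate_season
heat_index_missing = [1, 1, 1, 1, 0, 1, 0, 1, 0, 1]  # climate_heat_stress_index
cd4_missing = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]         # CD4 cell count

same_pattern = pearson(season_missing, heat_index_missing)
different_pattern = pearson(season_missing, cd4_missing)
print(round(same_pattern, 3), round(different_pattern, 3))
```

In these ten rows the two climate columns are missing together, so their nullity correlation is 1, while the CD4 column's missingness is only weakly (and negatively, about -0.22) related to that of the climate columns. Ten rows are far too few to characterise the full dataset; the sketch only shows how the measure is computed.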

Sample

(index) | study_source | Age (at enrolment) | Sex | primary_date | CD4 cell count (cells/µL) | HIV viral load (copies/mL) | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied | sa_biomarker_standards | climate_daily_mean_temp | climate_daily_max_temp | climate_daily_min_temp | climate_temp_anomaly | climate_heat_day_p90 | climate_heat_day_p95 | climate_heat_stress_index | climate_season
3377 | JHB_Aurum_009 | 24.0 | Female | 2014-02-15 | 369.0 | 0.0 | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
3378 | JHB_Aurum_009 | 38.0 | Female | 2014-04-09 | 701.0 | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
3379 | JHB_Aurum_009 | 21.0 | Male | 2014-08-12 | 654.0 | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
3380 | JHB_Aurum_009 | 29.0 | Male | 2014-04-29 | 350.0 | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
3381 | JHB_Aurum_009 | 35.0 | Female | 2013-04-29 | 324.0 | 0.0 | 0.0 | 1.0 | False | 1.0 | 17.799 | 25.800 | 10.493 | 10.025 | 0.0 | 0.0 | 19.958 | Autumn
3382 | JHB_Aurum_009 | 22.0 | Male | 2014-06-26 | 276.0 | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
3383 | JHB_Aurum_009 | 38.0 | Female | 2013-11-19 | NaN | NaN | 0.0 | 1.0 | False | 1.0 | 19.293 | 26.343 | 11.253 | 7.489 | 0.0 | 0.0 | 21.523 | Spring
3384 | JHB_Aurum_009 | NaN | Male | 2014-09-08 | NaN | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
3385 | JHB_Aurum_009 | 22.0 | Female | 2013-08-24 | 525.0 | NaN | 0.0 | 1.0 | False | 1.0 | 9.356 | 17.553 | 2.343 | 6.918 | 0.0 | 0.0 | 13.639 | Winter
3386 | JHB_Aurum_009 | 42.0 | Male | 2014-03-24 | 287.0 | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN

(index) | study_source | Age (at enrolment) | Sex | primary_date | CD4 cell count (cells/µL) | HIV viral load (copies/mL) | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied | sa_biomarker_standards | climate_daily_mean_temp | climate_daily_max_temp | climate_daily_min_temp | climate_temp_anomaly | climate_heat_day_p90 | climate_heat_day_p95 | climate_heat_stress_index | climate_season
6118 | JHB_Aurum_009 | 23.0 | Male | 2013-07-17 | 174.0 | NaN | 0.0 | 1.0 | False | 1.0 | 13.868 | 21.347 | 7.436 | 10.271 | 0.0 | 0.0 | 14.306 | Winter
6119 | JHB_Aurum_009 | 36.0 | Male | 2013-06-06 | 110.0 | NaN | 0.0 | 1.0 | False | 1.0 | 13.656 | 21.474 | 6.034 | 9.839 | 0.0 | 0.0 | 13.428 | Winter
6120 | JHB_Aurum_009 | 29.0 | Male | 2014-06-17 | 393.0 | 0.0 | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
6121 | JHB_Aurum_009 | 34.0 | Female | 2014-02-03 | 202.0 | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
6122 | JHB_Aurum_009 | 34.0 | Female | 2014-04-29 | 31.0 | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
6123 | JHB_Aurum_009 | 31.0 | Male | 2014-04-23 | 365.0 | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
6124 | JHB_Aurum_009 | 31.0 | Female | 2013-08-27 | 586.0 | NaN | 0.0 | 1.0 | False | 1.0 | 9.356 | 17.553 | 2.343 | 6.918 | 0.0 | 0.0 | 13.639 | Winter
6125 | JHB_Aurum_009 | 65.0 | Male | 2014-08-14 | 409.0 | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
6126 | JHB_Aurum_009 | 28.0 | Male | 2014-08-04 | 455.0 | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
6127 | JHB_Aurum_009 | 23.0 | Male | 2013-11-16 | 300.0 | NaN | 0.0 | 1.0 | False | 1.0 | 19.293 | 26.343 | 11.253 | 7.489 | 0.0 | 0.0 | 21.523 | Spring

Duplicate rows

Most frequently occurring

(index) | study_source | Age (at enrolment) | Sex | primary_date | CD4 cell count (cells/µL) | HIV viral load (copies/mL) | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied | sa_biomarker_standards | climate_daily_mean_temp | climate_daily_max_temp | climate_daily_min_temp | climate_temp_anomaly | climate_heat_day_p90 | climate_heat_day_p95 | climate_heat_stress_index | climate_season | # duplicates
0 | JHB_Aurum_009 | 23.0 | Male | 2013-07-15 | NaN | NaN | 0.0 | 1.0 | False | 1.0 | 13.868 | 21.347 | 7.436 | 10.271 | 0.0 | 0.0 | 14.306 | Winter | 2
1 | JHB_Aurum_009 | 32.0 | Female | 2014-03-29 | NaN | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2
2 | JHB_Aurum_009 | 37.0 | Female | 2014-10-28 | NaN | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2
3 | JHB_Aurum_009 | 39.0 | Male | 2014-08-12 | NaN | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2
4 | JHB_Aurum_009 | 49.0 | Male | 2014-04-02 | NaN | NaN | 0.0 | 1.0 | False | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2